Short-Text Similarity Measurement Using Word Sense Disambiguation and Synonym Expansion

نویسندگان

  • Khaled Abdalgader
  • Andrew Skabar
چکیده

Measuring the similarity between text fragments at the sentence level is made difficult by the fact that two sentences that are semantically related may not contain any words in common. This means that standard IR measures of text similarity, which are based on word co-occurrence and designed to operate at the document level, are not appropriate. While various sentence similarity measures have been recently proposed, these measures do not fully utilise the semantic information available from lexical resources such as WordNet. In this paper we propose a new sentence similarity measure which uses word sense disambiguation and synonym expansion to provide a richer semantic context to measure sentence similarity. Evaluation of the measure on three benchmark datasets shows that as a stand-alone sentence similarity measure, the method achieves better results than other methods recently reported in the literature.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

بررسی نقش انواع بافتار هم‌نویسه‌ها در تعیین شباهت بین مدارک

Aim: Automatic information retrieval is based on the assumption that texts contain content or structural elements that can be used in word sense disambiguation and thereby improving the effectiveness of the results retrieved. Homographs are among the words requiring sense disambiguation. Depending on their roles and positions in texts, homograph contexts could be divided to different types, wit...

متن کامل

Improve Lexicon-based Word Embeddings By Word Sense Disambiguation

There have been some works that learn a lexicon together with the corpus to improve the word embeddings. However, they either model the lexicon separately but update the neural networks for both the corpus and the lexicon by the same likelihood, or minimize the distance between all of the synonym pairs in the lexicon. Such methods do not consider the relatedness and difference of the corpus and...

متن کامل

PUTOP: Turning Predominant Senses into a Topic Model for Word Sense Disambiguation

We extend on McCarthy et al.’s predominant sense method to create an unsupervised method of word sense disambiguation that uses automatically derived topics using Latent Dirichlet allocation. Using topicspecific synset similarity measures, we create predictions for each word in each document using only word frequency information. It is hoped that this procedure can improve upon the method for l...

متن کامل

Merging Word Senses

WordNet, a widely used sense inventory for Word Sense Disambiguation(WSD), is often too fine-grained for many Natural Language applications because of its narrow sense distinctions. We present a semi-supervised approach to learn similarity between WordNet synsets using a graph based recursive similarity definition. We seed our framework with sense similarities of all the word-sense pairs, learn...

متن کامل

Lexical ambiguity and Information Retrieval revisited

A number of previous experiments on the role of lexical ambiguity, in Information Retrieval are reproduced on the'IR-Semcor test collection (derived from Semcor), where both queries and documents are hand-tagged ;with phrases, Part-Of-Speech and WordNet 1.5 senses. Our results indicate that a) Word Sense Disambiguation can be more beneficial to Information Retrieval than the experiments of Sand...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010